Skip to content

restrict rest workers based on open files limit#4108

Open
dtrawins wants to merge 11 commits intomainfrom
limit-workers
Open

restrict rest workers based on open files limit#4108
dtrawins wants to merge 11 commits intomainfrom
limit-workers

Conversation

@dtrawins
Copy link
Copy Markdown
Collaborator

@dtrawins dtrawins commented Apr 1, 2026

🛠 Summary

Default number of REST workers based only on the number of CPUs might exceed the number of open files limit. It can happen for example in Kubernetes with Xeon CPU with almost 400 CPU cores and ulimit 1024 files.
This change make the default number of REST workers safe in such conditions. It will include open files limit as criteria to allow initialization and open connections. The ration between rest worker could assumes open files for drongon, mediapipe graphs and client connections.
When the number of REST workers is set explicitly, and required open files number exceeds the limit, a clear error will be raised and OVMS will exit without attempt to load the model.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.
    ``

Copilot AI review requested due to automatic review settings April 1, 2026 23:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates OVMS REST worker configuration to account for Linux open-files limits, preventing overly large rest_workers values and adding regression tests.

Changes:

  • Compute a Linux-specific default for rest_workers based on RLIMIT_NOFILE and add a validation cap tied to open-files availability.
  • Add death tests validating behavior when the open-files limit is low.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/config.cpp Adds Linux open-files-based default/validation logic for rest_workers.
src/test/ovmsconfig_test.cpp Adds tests that manipulate RLIMIT_NOFILE to validate the new rest_workers constraints.

src/config.cpp Outdated
Comment on lines +50 to +54
// We need to minimize the number of default drogon workers since this value is set for both: unary and streaming (making it always double)
// on linux, restrict also based on the max allowed number of open files
#ifdef __linux__
#include <sys/resource.h>
const uint64_t MAX_OPEN_FILES = []() {
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include <sys/resource.h> is placed inside namespace ovms (and mid-file). Including system headers inside a namespace can unintentionally put all their declarations/types into that namespace and create subtle lookup/type issues. Move this include up with the other includes at file scope (outside the namespace) and keep only the constants inside ovms.

Copilot uses AI. Check for mistakes.
src/config.cpp Outdated
Comment on lines +326 to +329
if (restWorkers() > (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5) {
std::cerr << "rest_workers count cannot be larger than " << (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5 << " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
return false;
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Linux, this check subtracts RESERVED_OPEN_FILES from MAX_OPEN_FILES without guarding MAX_OPEN_FILES <= RESERVED_OPEN_FILES. With unsigned types this can underflow and effectively disable the limit check. Add a guard (or use a saturating subtraction) before computing (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5, and consider emitting a clear error when the open-files limit is too low to run REST workers at all.

Suggested change
if (restWorkers() > (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5) {
std::cerr << "rest_workers count cannot be larger than " << (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5 << " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
return false;
}
if (MAX_OPEN_FILES <= RESERVED_OPEN_FILES) {
std::cerr << "Open files limit (" << MAX_OPEN_FILES
<< ") is too low to run REST workers. Increase the RLIMIT_NOFILE soft limit "
<< "or adjust the server configuration." << std::endl;
return false;
}
const uint64_t maxRestWorkersByFiles = (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5;
if (restWorkers() > maxRestWorkersByFiles) {
std::cerr << "rest_workers count cannot be larger than " << maxRestWorkersByFiles
<< " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +20
#include <regex>
#include <sys/resource.h>
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<sys/resource.h> (and the new getrlimit/setrlimit-based tests) are Linux-specific, but the include is unconditional. This will break builds on non-Linux platforms. Wrap the include and these two new tests in #ifdef __linux__ (or equivalent platform guard).

Suggested change
#include <regex>
#include <sys/resource.h>
#include <regex>
#ifdef __linux__
#include <sys/resource.h>
#endif

Copilot uses AI. Check for mistakes.
Comment on lines +206 to +212
TEST_F(OvmsConfigDeathTest, restWorkersDefaultReducedForOpenFilesLimit) {
// limit allowed number of open files to 1024 to make sure that rest_workers count is too large for the limit based on number of cpu cores alone
int cpu_cores = ovms::getCoreCount();
struct rlimit limit;
ASSERT_EQ(getrlimit(RLIMIT_NOFILE, &limit), 0);
struct rlimit newLimit = {static_cast<rlim_t>(cpu_cores * 5), limit.rlim_max};
std::cout << "Setting open files limit to " << newLimit.rlim_cur << " to test that default rest_workers count is reduced based on open files limit" << std::endl;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test comment says the open-files limit is set to 1024, but the code actually sets it to cpu_cores * 5. Please update the comment to match the behavior (or adjust the limit if 1024 is intended).

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +214
int cpu_cores = ovms::getCoreCount();
struct rlimit limit;
ASSERT_EQ(getrlimit(RLIMIT_NOFILE, &limit), 0);
struct rlimit newLimit = {static_cast<rlim_t>(cpu_cores * 5), limit.rlim_max};
std::cout << "Setting open files limit to " << newLimit.rlim_cur << " to test that default rest_workers count is reduced based on open files limit" << std::endl;
ASSERT_EQ(setrlimit(RLIMIT_NOFILE, &newLimit), 0);

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setrlimit(RLIMIT_NOFILE, ...) can fail if newLimit.rlim_cur exceeds limit.rlim_max (or if the process lacks permission to change limits). As written, cpu_cores * 5 might be > rlim_max on some systems/containers, making this test flaky. Consider clamping rlim_cur to limit.rlim_max and/or skipping the test with a clear message when the limit cannot be adjusted.

Copilot uses AI. Check for mistakes.
dtrawins and others added 4 commits April 2, 2026 09:18
return false;
}
#ifdef __linux__
if (restWorkers() > (getMaxOpenFilesLimit() - RESERVED_OPEN_FILES) / 6) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it 6 here, but 7 in getDefaultRestWorkers()?

if (maxOpenFiles <= RESERVED_OPEN_FILES) {
return static_cast<uint64_t>(2); // minimum functional number
}
return std::min(static_cast<uint64_t>(AVAILABLE_CORES), (maxOpenFiles - RESERVED_OPEN_FILES) / 7); // 5x rest_workers to initialize ovms and 2x rest_workers for new connections
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add log that number of workers has been decreased due to limit of open files?

char* n_argv[] = {"ovms", "--config_path", "/path1", "--rest_port", "8080", "--port", "8081"};
int arg_count = 7;
ovms::Config::instance().parse(arg_count, n_argv);
EXPECT_TRUE(ovms::Config::instance().validate());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test doesnt check if number of workers has been changed

return false;
}
#ifdef __linux__
if (restWorkers() > (getMaxOpenFilesLimit() - RESERVED_OPEN_FILES) / 6) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is default rest workers also max possible rest workers?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants